    Subgroup Discovery for Defect Prediction

    Randomly sampling maximal itemsets

    Using ILP to Identify Pathway Activation Patterns in Systems Biology

    We present a logical aggregation method that, combined with propositionalization methods, can construct novel structured biological features from gene expression data. We do this to gain an understanding of pathway mechanisms, for instance those associated with a particular disease. We illustrate this method on the task of distinguishing between two types of lung cancer: Squamous Cell Carcinoma (SCC) and Adenocarcinoma (AC). We identify pathway activation patterns in pathways previously implicated in the development of cancers. Our method identified a model with predictive performance comparable to that of the winning algorithm of a recent challenge, while providing biologically relevant explanations that may be useful to a biologist.
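
    A minimal, hypothetical sketch of the propositionalization idea described above (not taken from the paper): each structured feature aggregates the expression of a gene set, here a pathway, into a single boolean such as "at least half of the pathway's genes are over-expressed". The gene names, threshold and aggregation rule are illustrative assumptions.

        import numpy as np

        def pathway_activation_feature(expression, gene_index, pathway_genes,
                                       threshold=1.0, min_fraction=0.5):
            """True if at least min_fraction of the pathway's genes are over-expressed."""
            cols = [gene_index[g] for g in pathway_genes if g in gene_index]
            if not cols:
                return False
            return (expression[cols] > threshold).mean() >= min_fraction

        # toy usage: one expression profile over five (hypothetical) genes
        gene_index = {"EGFR": 0, "KRAS": 1, "TP53": 2, "SOX2": 3, "KEAP1": 4}
        sample = np.array([1.8, 0.2, 1.5, 2.1, 0.4])
        print(pathway_activation_feature(sample, gene_index, ["EGFR", "TP53", "SOX2"]))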

    Mining compact predictive pattern sets using classification model

    In this paper, we develop a new framework for mining predictive patterns that aims to compactly describe the condition (or class) of interest. Our framework relies on a classification model that considers and combines various predictive pattern candidates and selects only those that are important for improving the overall class prediction performance. We test our approach on data derived from the MIMIC-III EHR database, focusing on patterns predictive of sepsis. We show that, using our classification approach, we can achieve a significant reduction in the number of extracted patterns compared to state-of-the-art methods based on the minimum predictive pattern mining approach, while preserving the overall classification accuracy of the model.
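
    A hedged sketch of the general idea, not the authors' implementation: given a pool of candidate binary pattern features (the mining step is assumed already done), keep only those patterns whose addition improves the cross-validated class prediction. scikit-learn's logistic regression stands in for the classification model; the selection rule and tolerance are illustrative.

        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def select_compact_pattern_set(pattern_matrix, y, tol=0.001):
            """pattern_matrix: 2-D numpy array of 0/1 pattern indicators; y: class labels."""
            selected, best = [], 0.0
            remaining = list(range(pattern_matrix.shape[1]))
            improved = True
            while improved and remaining:
                improved = False
                # score each remaining pattern by the cross-validated accuracy it adds
                scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                                           pattern_matrix[:, selected + [j]], y,
                                           cv=5).mean(), j)
                          for j in remaining]
                acc, j = max(scores)
                if acc > best + tol:          # keep a pattern only if prediction improves
                    best, improved = acc, True
                    selected.append(j)
                    remaining.remove(j)
            return selected, best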

    Dedalo: looking for clusters explanations in a labyrinth of Linked Data

    We present Dedalo, a framework that is able to exploit Linked Data to generate explanations for clusters. In general, any result of a Knowledge Discovery process, including clusters, is interpreted by human experts who use their background knowledge to explain them. However, for someone without such expert knowledge, those results may be difficult to understand. Obtaining a complete and satisfactory explanation becomes a laborious and time-consuming process, involving expertise in possibly different domains. At the same time, not only does the Web of Data contain vast amounts of such background knowledge, but it also natively connects those domains. While the effort put into the interpretation process can be reduced with the support of Linked Data, how to automatically access the right piece of knowledge in such a big space remains an issue. Dedalo is a framework that dynamically traverses Linked Data to find commonalities that form explanations for items of a cluster. We have developed different strategies (or heuristics) to guide this traversal, reducing the time needed to find the best explanation. In our experiments, we compare those strategies and demonstrate that Dedalo finds relevant and sophisticated Linked Data explanations from different areas.
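
    A simplified, hypothetical sketch of the traversal idea: candidate explanations are property paths over a toy Linked Data graph, expanded best-first and scored by how well the most common value they lead to covers the cluster versus the rest. The graph encoding, scoring heuristic and expansion strategy here are stand-ins for Dedalo's own, not a reproduction of them.

        import heapq
        from collections import Counter

        def best_explanation(graph, cluster, others, max_len=3):
            """graph: dict entity -> list of (property, value) edges; cluster/others: entity lists."""

            def follow(entities, path):
                vals = set(entities)
                for prop in path:
                    vals = {v for x in vals for p, v in graph.get(x, []) if p == prop}
                return vals

            def score(path):
                # candidate explanation: "following `path` leads to value v"
                counts = Counter(v for e in cluster for v in follow({e}, path))
                if not counts:
                    return -1.0, None
                v = counts.most_common(1)[0][0]
                pos = sum(v in follow({e}, path) for e in cluster) / len(cluster)
                neg = sum(v in follow({e}, path) for e in others) / max(len(others), 1)
                return pos - neg, v

            frontier = [(0.0, [])]                    # (negated score, property path)
            best_score, best_path, best_value = -1.0, None, None
            while frontier:
                _, path = heapq.heappop(frontier)     # expand the most promising path first
                if len(path) >= max_len:
                    continue
                endpoints = follow(cluster, path)
                for prop in {p for e in endpoints for p, _ in graph.get(e, [])}:
                    new = path + [prop]
                    s, v = score(new)
                    if s > best_score:
                        best_score, best_path, best_value = s, new, v
                    heapq.heappush(frontier, (-s, new))
            return best_path, best_value, best_score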

    Improved comprehensibility and reliability of explanations via restricted halfspace discretization

    A number of two-class classification methods first discretize each attribute of two given training sets and then construct a propositional DNF formula that evaluates to True for one of the two discretized training sets and to False for the other one. The formula is not just a classification tool but constitutes a useful explanation of the differences between the two underlying populations, provided it can be comprehended by humans and is reliable. This paper shows that both the comprehensibility and the reliability of the formulas can sometimes be improved using a discretization scheme in which linear combinations of a small number of attributes are discretized.
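
    A minimal sketch of the idea, not the paper's algorithm: instead of cutting a single attribute, a cut is placed on a linear combination of a small number of attributes, yielding one boolean (halfspace) feature per cut that can then enter a DNF formula. The weights and thresholds below are illustrative; the paper derives them from the training sets.

        import numpy as np

        def halfspace_feature(X, cols, weights, threshold):
            """Boolean column: True where the weighted sum of the chosen attributes exceeds threshold."""
            return X[:, cols] @ np.asarray(weights) > threshold

        # toy usage: a cut on the combination 2*x0 - x3, combined with an ordinary cut
        X = np.random.rand(10, 5)
        h = halfspace_feature(X, cols=[0, 3], weights=[2.0, -1.0], threshold=0.5)
        single = X[:, 1] > 0.7                  # an ordinary single-attribute cut
        term = h & single                       # conjunction usable as one term of a DNF formula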

    Finding a short and accurate decision rule in disjunctive normal form by exhaustive search

    Greedy approaches suffer from a restricted search space, which can lead to suboptimal classifiers in terms of both performance and classifier size. This study discusses exhaustive search as an alternative to greedy search for learning short and accurate decision rules. The Exhaustive Procedure for LOgic-Rule Extraction (EXPLORE) algorithm is presented to induce decision rules in disjunctive normal form (DNF) in a systematic and efficient manner. We propose a method based on subsumption to reduce the number of values considered for instantiation in the literals by taking the relational operator into account, without loss of performance. Furthermore, we describe a branch-and-bound approach that makes optimal use of user-defined performance constraints. To improve generalizability, we use a validation set to determine the optimal length of the DNF rule. The performance and size of the DNF rules induced by EXPLORE are compared to those of eight well-known rule learners. Our results show that an exhaustive approach to rule learning in DNF yields significantly smaller classifiers than those of the other rule learners, while securing comparable or even better performance. Clearly, exhaustive search is computationally intensive and may not always be feasible. Nevertheless, based on this study, we believe that exhaustive search should be considered an alternative to greedy search in many problems.
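
    A hedged illustration of the exhaustive enumeration itself, without EXPLORE's subsumption-based reduction or branch-and-bound pruning: every DNF rule with at most a given number of conjunctions of at most a given number of candidate literals is evaluated, and the most accurate rule meeting a user-defined sensitivity constraint is kept. Function and parameter names are mine, not the paper's.

        from itertools import combinations
        import numpy as np

        def exhaustive_dnf(literal_matrix, y, max_literals=2, max_terms=2, min_sens=0.8):
            """literal_matrix: boolean numpy array, one column per candidate literal; y: 0/1 labels."""
            n_lit = literal_matrix.shape[1]
            # all conjunctions of up to max_literals literals
            terms = [np.all(literal_matrix[:, list(c)], axis=1)
                     for k in range(1, max_literals + 1)
                     for c in combinations(range(n_lit), k)]
            best_rule, best_acc = None, -1.0
            for k in range(1, max_terms + 1):
                for rule in combinations(range(len(terms)), k):   # disjunction of term indices
                    pred = np.any([terms[t] for t in rule], axis=0)
                    sensitivity = pred[y == 1].mean()
                    accuracy = (pred == y).mean()
                    if sensitivity >= min_sens and accuracy > best_acc:
                        best_rule, best_acc = rule, accuracy
            return best_rule, best_acc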

    Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search

    We contribute a novel ball-histogram approach to predicting the DNA-binding propensity of proteins. Unlike state-of-the-art methods based on constructing an ad hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA-binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids.
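
    A simplified sketch of the ball-histogram construction for a single residue property (the method itself builds joint histograms over automatically selected property templates): random balls are sampled inside the protein's bounding region and, for each ball, the residues inside it carrying the property are counted; the normalized distribution of those counts is the feature vector. The radius, sample size and "charged" property below are illustrative assumptions.

        import numpy as np

        def ball_histogram(coords, props, radius=8.0, n_balls=5000, max_count=10, seed=0):
            """coords: (n_residues, 3) residue coordinates; props: boolean property flags per residue."""
            rng = np.random.default_rng(seed)
            lo, hi = coords.min(axis=0), coords.max(axis=0)
            centers = rng.uniform(lo, hi, size=(n_balls, 3))   # random ball centres in the bounding box
            hist = np.zeros(max_count + 1)
            for c in centers:
                inside = np.linalg.norm(coords - c, axis=1) <= radius
                hist[min(int(props[inside].sum()), max_count)] += 1
            return hist / n_balls                              # distribution over per-ball counts

        # toy usage: 200 residues with a randomly assigned "charged" flag
        coords = np.random.rand(200, 3) * 50
        charged = np.random.rand(200) < 0.25
        features = ball_histogram(coords, charged)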

    Transductive Learning for Spatial Data Classification

    Full text link
    Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, spatial autocorrelation, and the abundance of unlabelled data which potentially convey a large amount of information. The first three issues are due to the inherent structure of the spatial units of analysis, which can be easily accommodated if a (multi-)relational data mining approach is considered. The fourth issue demands the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the affinity between positive autocorrelation, which typically affects spatial phenomena, and the smoothness assumption that characterizes the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the naïve Bayes classifier is proposed as the discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k-nearest neighbors of each example in the dataset. The computational solutions have been tested on two real-world spatial datasets. The transformation of the spatial data into a multi-relational representation and the experimental results are reported and commented on.
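
    A schematic sketch of the iterative transductive loop described above, with plain feature vectors and Euclidean distance standing in for the paper's relational descriptions and relational distance measure, and scikit-learn's GaussianNB standing in for the relational upgrade of the naïve Bayes classifier.

        import numpy as np
        from sklearn.naive_bayes import GaussianNB

        def transductive_classify(X_lab, y_lab, X_unlab, k=5, n_iter=10):
            # initial working labels for the unlabelled examples from the trained classifier
            y_work = GaussianNB().fit(X_lab, y_lab).predict(X_unlab)
            X_all = np.vstack([X_lab, X_unlab])
            for _ in range(n_iter):
                y_all = np.concatenate([y_lab, y_work])
                new = y_work.copy()
                for i, x in enumerate(X_unlab):
                    d = np.linalg.norm(X_all - x, axis=1)
                    d[len(X_lab) + i] = np.inf            # do not count the example itself
                    nn = np.argsort(d)[:k]                # its k nearest neighbours
                    vals, counts = np.unique(y_all[nn], return_counts=True)
                    new[i] = vals[np.argmax(counts)]      # majority vote enforces smoothness
                if np.array_equal(new, y_work):           # stop when the labelling stabilizes
                    break
                y_work = new
            return y_work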